To: Distribution

From: T. H. Van Vleck

Date: 01/13/75

Subject: Unattended Operation of Multics


Several Multics installations have been attempting to run
their systems without an operator present. Although this is not
an advertised feature of Multics, these sites have done pretty
well by leaving the system running when the operators go off
shift, with the understanding that the system will be available
without backup or tape mounting facilities, and that if the
system crashes it will stay down until an operator comes in.
This memorandum describes changes to the supervisor and support
procedures which would make unattended operation easier and more
reliable. With some very simple changes, unattended mode becomes
usable almost immediately; later changes which make additional
improvements in unattended mode are also described.


AUTOMATIC RESTART AFTER A SYSTEM CRASH 

When Multics crashes, the operator usually brings the system
back up again. Almost all of the steps which the operator takes
can be automated. These steps are:

1. Determine that the system has crashed.
2. Invoke the "CRASH" runcom.
3. Invoke the LD355 and BOOT commands.
4. Reply "startup" to the Initializer process.
5. Start the Backup and IO daemons.


Existing facilities are sufficient for automating many of these
steps. Only minor changes need be made to fix the others.


Determining When the System Has Crashed


Crashes will be detected when the system returns to BOS from
Multics operation. There are some cases of system crash, such as
power failure, idle loop, or hung Initializer process, which do
not return to BOS; these cases are not handled by the new
scheme, but are fortunately fairly rare. There are also cases in
which the system returns to BOS but has not crashed, for instance
after a normal shutdown, or in response to the system_control_
"bos" command; modifications will be made to the supervisor and
to BOS to detect these cases.


We will set up one word of storage in the BOS toehold for
intercommunication between Multics and BOS. This word will be
used as a set of 36 switches which will be set either by a BOS
command or by a Multics privileged call. One of these switches
will be reserved to mean "automatic reboot mode is on"; this
switch will be turned ON by a Multics command, and will be turned
OFF by Multics command, BOS command, or by automatic crash-loop
detecting code in Multics startup. Another switch will mean "the
system crashed"; this switch will be set ON at boot time and
reset by normal shutdown and calls to pmut$bos_and_return.
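
As an illustration only, the switch word might be modeled as
shown here. This is a minimal Python sketch, not the BOS or
Multics implementation; the bit positions and the names
AUTO_REBOOT, CRASHED, set_switch, clear_switch, is_on, at_boot,
and at_normal_shutdown are all assumptions made for the example.

```python
# Sketch of the proposed 36-switch communication word.
# Bit positions and all names here are hypothetical.
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1

AUTO_REBOOT = 1 << 0  # "automatic reboot mode is on"
CRASHED = 1 << 1      # "the system crashed"


def set_switch(word, switch):
    """Return the word with the given switch turned ON."""
    return (word | switch) & WORD_MASK


def clear_switch(word, switch):
    """Return the word with the given switch turned OFF."""
    return word & ~switch & WORD_MASK


def is_on(word, switch):
    """Report whether the given switch is ON."""
    return bool(word & switch)


def at_boot(word):
    """At boot time, assume a crash until a clean exit says otherwise."""
    return set_switch(word, CRASHED)


def at_normal_shutdown(word):
    """A normal shutdown resets the "crashed" switch."""
    return clear_switch(word, CRASHED)
```

Setting "crashed" at boot and clearing it only on a clean exit
means that any unplanned return to BOS leaves the switch ON
without the failing system having to do anything.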


When the system returns to BOS, any runcom which contained
the BOOT command will continue on to its next statement. This
statement will be a test of the "crashed" switch in the standard
runcom, to discover whether the system crashed or shut down
normally.


It is already possible to distinguish between the case of a
Return-To-BOS caused by the Multics software and one caused by an
XED 4000 from the processor panel. Since the latter case implies
that someone is present in the computer room, it should not be
assumed that this action is a "crash." Similarly, the setting of
the "crashed" switch will allow BOS to distinguish between calls
to pmut$bos and pmut$bos_and_return; the second call is made
only in response to manual intervention and so should not be
detected as a crash.
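
The decision the runcom has to make on a return to BOS can be
summarized in a few lines. The following Python sketch is only a
restatement of the rules above; the function and parameter names
are hypothetical, and how BOS actually learns that entry came
from the processor panel is not modeled.

```python
def is_crash(crashed_switch_on, entered_from_panel):
    """Decide whether a return to BOS should count as a crash.

    entered_from_panel: True when BOS was entered by a manual
    execute from the processor panel, which implies a person is
    present in the computer room.
    crashed_switch_on: state of the "crashed" switch, which a
    normal shutdown or a bos_and_return call has already reset.
    """
    if entered_from_panel:
        return False
    return crashed_switch_on


def should_reboot_automatically(crashed_switch_on, auto_reboot_on,
                                entered_from_panel):
    """Reboot only for a real crash while automatic mode is ON."""
    return auto_reboot_on and is_crash(crashed_switch_on,
                                       entered_from_panel)
```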


Some changes could be made in the scheduler to detect idle
loops, and it should be possible to set up a detector for the
cases in which the Initializer process is hung; but these are
refinements which can be proposed separately. A special command
to cause the system to crash should also be constructed which can
be used by trustworthy installation personnel in cases where they
notice that the system should be restarted. (Special access
privilege will be provided on the gate which performs this
function.)


Invoking the CRASH Runcom


When the operator discovers a system crash, he usually types
CRASH immediately, unless it is obvious that a hardware failure
will prevent recovery from working. The usual recovery
procedures consist of the following commands:


FDUMP (unless inhibited)
FD355
BLAST CRASH
ESD
SALV (if necessary)
LD355
BOOT


The actual runcoms will of course be much more complicated, in
the style of MOSN-241. Various switches will be added to allow
the operator to request that the system pause before booting and
salvaging, after crashing, and so on. Similarly, modes to
suppress crash dumps altogether or to take printer and/or tape
dumps in addition to the FDUMP will be defined.


In order to prevent the system from cycling in a tight loop
of boot - crash immediately - recover - boot, several small
modifications will be made to the scheme so far outlined, so that
the default after an automatic reboot is to turn off automatic
mode, until the answering service determines that the system is
truly up, and is not in a crash loop. The advantage to this
approach is that the full Multics environment is available for
programs which try to detect the loop. Automatic rebooting will
be discontinued after N crashes after automatic mode was entered.
It will also be turned off if there are more than M crashes in K
minutes; and admin commands will be added to the Initializer to
allow the setting of these parameters.
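
The two limits can be sketched as a small bookkeeping object.
This is a hedged Python illustration, not the proposed
Initializer code: the class name CrashLoopGuard, its method, and
the choice to measure time in seconds are assumptions, and a
crash exactly K minutes old is assumed to still count as being
within the window.

```python
from collections import deque


class CrashLoopGuard:
    """Tracks crashes since automatic mode was entered (hypothetical)."""

    def __init__(self, max_total, max_in_window, window_minutes):
        self.max_total = max_total              # N in the text
        self.max_in_window = max_in_window      # M in the text
        self.window_seconds = window_minutes * 60  # K in the text
        self.total = 0
        self.recent = deque()   # crash times inside the window
        self.enabled = True     # automatic rebooting still allowed

    def record_crash(self, now_seconds):
        """Record one crash; return True if auto reboot may continue."""
        self.total += 1
        self.recent.append(now_seconds)
        # Forget crashes that have aged out of the K-minute window.
        while now_seconds - self.recent[0] > self.window_seconds:
            self.recent.popleft()
        if self.total >= self.max_total:
            self.enabled = False   # N crashes reached
        if len(self.recent) > self.max_in_window:
            self.enabled = False   # more than M crashes in K minutes
        return self.enabled
```

Once either limit trips, the guard stays off until someone
re-enters automatic mode, which matches the intent that a person
should look at a system that keeps crashing.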


Invoking LD355 and BOOT


The restarting of the system will be done as a part of the
repeat loop which the system will be trapped in, unless a "manual
intervention" switch is set or automatic mode has been turned
off. Tape-positioning operations may have to be inserted in the
runcom to position the unified bootload tape (see MTB-135). The
drive on which the unified tape is mounted must be unavailable
for use by the supervisor tape-assignment code, so that the tape
can remain permanently mounted.


Replying to Initializer


When the Initializer has completed hardcore initialization,
it enters the ring-1 environment and waits for an input command
from the master console which tells it whether to do a cold boot,
enter BOS, or start up the system. The system_startup_ program
must be modified to check the switches in the toehold and to
manufacture a "startup" command if "automatic reboot" mode is ON.
The ring-1 environment will then proceed to call
system_control_$startup for answering service initialization.


Starting the Daemons


Most installations have their system_start_up.ec arranged so
that the daemons do not automatically start running, but only
come to command level, so that operators may modify parameters or
choose not to run the daemons at all. But when no operator is
present, the daemons will then hang. A simple solution to this
problem is to have running unattended imply running without any
backup or IO daemons. Alternatively, the standard action taken
by system_start_up.ec could be to include the startup of the
daemons. Neither of these choices is attractive. The best
solution is to make an active function available which will
return "true" when the system is in "automatic reboot" mode, so
that the system_start_up.ec can take a default action when the
system is unattended, and otherwise wait for the operator.


FACILITIES AVAILABLE IN UNATTENDED MODE 


The other duties of the system operator relate to the
tending of the printer and the mounting of tapes. If we provide
a per-system switch which tells whether the system is unattended,
it will be fairly easy to cause the system software to reject all
requests which cannot be handled automatically, and to suppress
messages which have no function except to prompt the operator.
For example, user tape-mount requests which cannot be satisfied
without human intervention should be rejected by the
tape-management software. Similarly, there seems to be little
harm in attempting to run the IO daemon, but when the printer
paper jams or runs out, the message describing this situation
need be printed only once; the device driver should then wait
quietly for someone to fix its problem.
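
A rough Python sketch of how software might consult such a
switch is given here. It is an illustration under assumptions:
the class UnattendedPolicy and its methods are hypothetical names,
and the real tape-management and device-driver interfaces are not
represented.

```python
class UnattendedPolicy:
    """Applies the per-system "unattended" switch (hypothetical)."""

    def __init__(self, unattended):
        self.unattended = unattended
        self._already_printed = set()

    def allow_tape_mount(self, needs_operator):
        """Reject mounts that need a human when nobody is present."""
        return not (self.unattended and needs_operator)

    def operator_message(self, device, text):
        """Return the text to print, or None to stay quiet.

        When unattended, a given message for a given device is
        printed only once; repeats are suppressed.
        """
        if not self.unattended:
            return text
        key = (device, text)
        if key in self._already_printed:
            return None
        self._already_printed.add(key)
        return text
```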


Once "unattended" mode is set, user programs should be able
to determine that there is no operator present, and when one will
be available, if this is known. A flag in whotab will be used to
indicate the system mode, and a few more words to show the time
this mode will change. The system mode and schedule data will be
set by issuing an Initializer process command. (Perhaps adding
and deleting an operator should be thought of as dynamic
reconfiguration.) When the system changes from unattended to
attended, the flag in whotab may be used, at installation option,
for some other information such as the initials of the chief
operator on duty. The subroutine system_info_ and the
command/active function "system" will be updated to return these
new items.


BACKUP 


The major problem with running the system in unattended mode
is that the incremental backup daemon will eventually fill its
output backup dump tape. When it does, it will request another
tape number from the operator. The reply to this question could
have been issued in advance, by a call to "reply" in
system_start_up.ec, but the daemon will then take this string and
call the tape software to have a tape mounted. Our suggestion so
far has been to refuse to mount the tape.


It will not be difficult to change the tape-assignment
package to make it work differently for the Backup daemon if the
system is unattended. When the operator is placing the system
into unattended mode, he will load all available drives with
blank backup tapes, and instruct the tape-assignment software to
assign these tapes one by one to the daemon as needed. This
requires that the tape mount software perform a slightly
different action when assigning a tape, that the tape DIM not
dump a tape at mount time if it is already mounted, and also that
the tapes which are not at load point when Multics is booted
should be (file marked and) unloaded.
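
The preloaded-drive idea amounts to a simple queue. The Python
sketch that follows is illustrative only; PreloadedTapePool,
assign_next, drives_to_unload, and the drive names used with them
are hypothetical and do not correspond to the tape-assignment
package's real interface.

```python
from collections import deque


class PreloadedTapePool:
    """Drives the operator loaded with blank backup tapes."""

    def __init__(self, drives):
        self._free = deque(drives)

    def assign_next(self):
        """Hand the next preloaded drive to the Backup daemon.

        Returns None when every preloaded tape has been used, at
        which point the mount request has to be refused.
        """
        if self._free:
            return self._free.popleft()
        return None


def drives_to_unload(at_load_point):
    """List drives whose tapes are not at load point at boot.

    at_load_point maps a drive name to True or False. Tapes that
    were partly written before a crash are taken out of service.
    """
    return sorted(d for d, ok in at_load_point.items() if not ok)
```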


CONFIGURATION 


The bug which prevents the system from coming up with more
than one CPU should be fixed, so that a large system which
crashes and successfully reboots is not limited to one CPU.
There will be cases, however, when the supervisor deconfigures
some system resources due to errors. (This is done with bulk
store records currently, and will be done with memory in the
future. With the new storage system, disk drives may also be
treated this way.) If a resource has been dropped due to error,
an automatic restart should know not to try to use the device
again.


GENERALIZATIONS OF THIS SCHEME 


Unattended mode should probably be implemented as a set of
switches, one for automatic rebooting, another for tape-drive
handling, and so forth. Some installations may wish to provide
weekend service with no operator in the computer room but with an
attended operator console distant from the machine room (via the
message coordinator). There are also devices on the market which
automatically find a tape and load it onto a tape drive in
response to a software order, so that an unattended system might
be able to answer most tape mount requests. Another possibility
which might be used by some sites is that one or more IO devices
(card readers, tape drives, printers) might be located in an area
accessible to trusted users, who would be able to issue commands
to attach and use these devices even when no operator is
attending the main computer.


The unattended modes should probably be implemented so that,
when a problem occurs, it will be convenient for someone to come
in, fix the problem, and let the system continue unattended. For
example, if a printer runs out of paper, and a system programmer
happens to pass the machine, he should be able to insert a fresh
box of paper and allow the daemon to continue printing, without
having to issue any Initializer commands or having to affect
unattended mode with respect to tapes.


